Add Web Audio priming/blessing #3974

compulim · 2021-07-06T18:02:31Z

Fixes #2316. Fixes #3823. Fixes #3899.

Changelog Entry

Added

Resolves #2316. Added blessing/priming of AudioContext when clicking on microphone button, by @compulim, in PR #3974

Fixed

Fixes #3823 and #3899. Fix speech recognition and synthesis on Safari, in PR #3974

Description

Instead of using the microphone input source from the Speech SDK (a.k.a. AudioConfig.fromMicrophone), we are implementing our own audio input source, for a few reasons:

In Safari, AudioContext needs to be primed/blessed
- Blessing means we call AudioContext.resume() from code that is initiated from user gestures, such as clicking on the microphone button
AudioContext should be reused to keep its blessing status
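
The two points above can be sketched as follows. This is a minimal illustration, not Web Chat's actual implementation: `createBlessedContext` and `ResumableAudioContext` are hypothetical names, and the interface only mirrors the `state` and `resume()` members of the standard `AudioContext` so the sketch can run without a browser.

```typescript
// Minimal structural view of the standard AudioContext members used here.
interface ResumableAudioContext {
  readonly state: string; // 'suspended' | 'running' | 'closed' in browsers
  resume(): Promise<void>;
}

// Hypothetical helper: lazily creates ONE context and keeps reusing it, so a
// blessing obtained from a user gesture is not thrown away with the context.
function createBlessedContext<T extends ResumableAudioContext>(factory: () => T) {
  let context: T | undefined;

  const get = (): T => {
    if (!context) {
      context = factory();
    }

    return context;
  };

  // Call this directly inside a user-gesture handler (e.g. a click handler).
  // resume() is invoked synchronously, before anything is awaited, so the
  // call happens while the gesture is still being handled.
  const bless = (): Promise<void> => {
    const current = get();

    return current.state === 'suspended' ? current.resume() : Promise.resolve();
  };

  return { bless, get };
}
```

In a browser, the factory would be something like `() => new AudioContext()`, and `bless()` would be called from the microphone button's click handler.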

However, the adapter provided by Speech SDK:

Does not keep the AudioContext instance; it always closes it after use
May not call AudioContext.resume() from a user gesture event

This means that, in Safari, Web Chat may occasionally lose access to the microphone.

We also added a new internal hook, useResumeAudioContext, which can be called from time to time to keep the AudioContext blessed.

Design

Our implementation builds on the Speech SDK's AudioConfig.fromStreamInput and Push/PullAudioInputStream.

There are a few caveats with Push/PullAudioInputStream:

PushAudioInputStream
- We can call write() whenever we have data to push a buffer to it
- Does not support more than one recognition; it will throw an error
- Does not signal us when the input stream is being torn down by the speech recognition engine (for example, when the recognition is completed)
PullAudioInputStream
- We provide a callback function, which the PullAudioInputStream calls continuously to pull buffers from us
- The callback function must be synchronous
- That means all data must be ready synchronously before the PullAudioInputStream is set up
- Microphone input is not synchronous, and its data is not ready at t=0
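
To make the synchronous-pull caveat concrete, here is a small sketch of a pull-style source whose `read()` must answer immediately. `SyncPullQueue` is a hypothetical class written for illustration; it is not the Speech SDK's PullAudioInputStream, only a model of the same constraint. When the consumer pulls before the producer has delivered anything, the only possible answer is "0 bytes".

```typescript
// Illustrative only: a pull-style source with a synchronous read().
class SyncPullQueue {
  private chunks: Uint8Array[] = [];

  // Called by the producer (e.g. a microphone callback) whenever data arrives.
  push(chunk: ArrayBuffer): void {
    this.chunks.push(new Uint8Array(chunk));
  }

  // Called by the consumer. Copies as many queued bytes as fit into `target`
  // and returns the number of bytes written. It cannot wait for data: if
  // nothing has arrived yet, it can only report 0 bytes.
  read(target: ArrayBuffer): number {
    const out = new Uint8Array(target);
    let written = 0;
    let head: Uint8Array | undefined;

    while (written < out.length && (head = this.chunks[0])) {
      const count = Math.min(head.length, out.length - written);

      out.set(head.subarray(0, count), written);
      written += count;

      if (count === head.length) {
        // The whole chunk was consumed.
        this.chunks.shift();
      } else {
        // Keep the unread tail of the chunk for the next read().
        this.chunks[0] = head.subarray(count);
      }
    }

    return written;
  }
}
```

A consumer that pulls at t=0, before the microphone has produced its first buffer, gets nothing back, which is why a purely synchronous callback is a poor fit for microphone input.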

Instead of building on Push/PullAudioInputStream, we build on their base class, AudioInputStream. However, there are a lot of quirks in the original implementation:

attach is called, but detach is never called
turnOff is called, but turnOn is never called
close is marked as abstract (must implement), but never called

We wrapped the Speech SDK's AudioInputStream implementation, cleaned up its quirks, and exposed the result internally as createAudioConfig. It requires only one callback, attach, and accepts an optional second one, turnOff:

attach, which returns a Promise of:
- Streams of ArrayBuffer chunks
- Audio format (e.g. 16-bit mono 16 kHz)
- Device info (manufacturer, model, connectivity type, device type)
turnOff (optional), which is called before the device is torn down
- Not called when used with Direct Line Speech
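
As a rough picture of the callback contract described above, the sketch below defines illustrative shapes for the `attach` result and a mocked input that produces silence. Every name here (`AudioFormat`, `DeviceInfo`, `AttachResult`, `AudioConfigOptions`, `bytesForDuration`, `createSilentInput`, and the `read` function standing in for the chunk stream) is an assumption made for this example; the real createAudioConfig in this PR may use different names and Speech SDK types.

```typescript
// Hypothetical shapes, written for illustration only.
interface AudioFormat {
  bitsPerSample: number;
  channels: number;
  samplesPerSec: number;
}

interface DeviceInfo {
  connectivity: string;
  manufacturer: string;
  model: string;
  type: string;
}

interface AttachResult {
  deviceInfo: DeviceInfo;
  format: AudioFormat;
  // Stand-in for the chunk stream: resolves with the next chunk, or with
  // undefined once the stream has ended.
  read(): Promise<ArrayBuffer | undefined>;
}

interface AudioConfigOptions {
  attach(): Promise<AttachResult>;
  turnOff?(): void;
}

// 16-bit mono 16 kHz, the example format mentioned above.
const PCM_16K_MONO: AudioFormat = { bitsPerSample: 16, channels: 1, samplesPerSec: 16000 };

// Number of bytes needed to hold `milliseconds` of audio in `format`.
function bytesForDuration(format: AudioFormat, milliseconds: number): number {
  const bytesPerSecond = (format.samplesPerSec * format.channels * format.bitsPerSample) / 8;

  return Math.round((bytesPerSecond * milliseconds) / 1000);
}

// A mocked input that yields `chunkCount` chunks of 100 ms of silence.
function createSilentInput(chunkCount: number): AudioConfigOptions {
  let remaining = chunkCount;

  const read = (): Promise<ArrayBuffer | undefined> => {
    if (remaining <= 0) {
      return Promise.resolve(undefined);
    }

    remaining--;

    // A fresh ArrayBuffer is zero-filled, i.e. silence for signed 16-bit PCM.
    return Promise.resolve(new ArrayBuffer(bytesForDuration(PCM_16K_MONO, 100)));
  };

  return {
    attach: (): Promise<AttachResult> =>
      Promise.resolve({
        deviceInfo: { connectivity: 'Unknown', manufacturer: 'Mock', model: 'Silence', type: 'Unknown' },
        format: PCM_16K_MONO,
        read
      }),
    // Stop producing chunks when the consumer tears the device down.
    turnOff: () => {
      remaining = 0;
    }
  };
}
```

Returning everything from a single Promise lets a real microphone implementation finish its asynchronous setup (permission prompt, device lookup) before any audio is requested.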

We use createAudioConfig in two places to prove that it works: real microphone input, and mocked input in the test harness.

In the future, whenever we bump the Speech SDK, we will need to verify that createAudioConfig continues to work with the new version.

Specific Changes

Added new createAudioConfig for implementing a custom AudioConfig for the Speech SDK without having to understand the SDK's internals
When using Cognitive Services or Direct Line Speech and audioConfig is not passed, we will use our custom AudioConfig for microphone input
- When using botframework-directlinespeech-sdk alone without Web Chat, it will NOT use our custom AudioConfig
Added new mock microphone input in test harness to use the new createAudioConfig
Added internal useResumeAudioContext hook to continuously bless AudioContext object
Bless AudioContext object when pointerdown event is received from window object
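
The last item above can be sketched as a small listener helper. This is an assumption-laden illustration rather than the code in this PR: `blessOnPointerDown` and `GestureTarget` are hypothetical names, and `GestureTarget` is a minimal structural stand-in for `window` so the helper can be exercised outside a browser.

```typescript
// Minimal structural stand-in for `window` (hypothetical, for illustration).
interface GestureTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Runs `bless` on every pointerdown event and returns a function that
// detaches the listener again.
function blessOnPointerDown(target: GestureTarget, bless: () => void): () => void {
  // bless() runs synchronously inside the event handler, i.e. while the
  // user gesture is still being handled.
  const handlePointerDown = (): void => bless();

  target.addEventListener('pointerdown', handlePointerDown);

  return () => target.removeEventListener('pointerdown', handlePointerDown);
}
```

In a browser, `target` would be `window` and `bless` would call `resume()` on the shared AudioContext; in a React component, the returned function would typically be used as an effect cleanup.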

I have added tests and executed them locally
I have updated CHANGELOG.md
I have updated documentation

Review Checklist

This section is for contributors to review your work.

~~Accessibility reviewed (tab order, content readability, alt text, color contrast)~~
Browser and platform compatibilities reviewed
~~CSS styles reviewed (minimal rules, no z-index)~~
~~Documents reviewed (docs, samples, live demo)~~
~~Internationalization reviewed (strings, unit formatting)~~
package.json and package-lock.json reviewed
~~Security reviewed (no data URIs, check for nonce leak)~~
Tests reviewed (coverage, legitimacy)

packages/bundle/src/speech/createAudioConfig.ts

compulim added 12 commits July 5, 2021 01:29

Add AudioContext priming

4515654

Fix build break

bbb182c

Update comment

85bcc60

Add resumeAudioContext

452c583

Add descriptions

146dab3

Move resumeAudioContext to Dictation

1c4f5d6

Support Direct Line Speech

f543d9a

Clean up

90848be

Add tests

8cfa475

Update comment

837a9ec

Add comments

529424e

Add entries

135e47e

compulim marked this pull request as ready for review July 6, 2021 18:25

compulim requested review from a-b-r-o-w-n, beyackle, cwhitten, srinaath, tdurnford and tonyanziano as code owners July 6, 2021 18:26

Clean up

43d3706

This was referenced Jul 6, 2021

[Speech] Build our own PcmRecorder #3975

Open

[Speech] Sample for building custom microphone input #3976

Open

Update comment

f0c2b3a

compulim assigned cwhitten Jul 6, 2021

cwhitten reviewed Jul 6, 2021

packages/bundle/src/speech/createAudioConfig.ts

cwhitten approved these changes Jul 6, 2021

compulim added 4 commits July 6, 2021 12:55

Fix tests

e3fbd6c

Apply PR suggestions

2976141

Fix type check

203bcdd

Add test

0ae7c23

Fix type check

928052b

compulim merged commit 3011d97 into microsoft:main Jul 7, 2021

compulim deleted the fix-prime-speech branch July 7, 2021 14:29

This was referenced Jul 7, 2021

p-defer is impacting ES5 build #3977

Closed

Use p-defer-es5 instead #3978

Merged

[Speech] Direct Line Speech should continue to work in environments without Web Audio #3979

Closed

Skip custom MicrophoneAudioConfig for unsupported hosts #3980

Merged
